COVSMA stands for Copernicus Satellites Versus Maladies. The current health crisis calls for an online tool that monitors pollution levels; displays alerts intended to prompt governments to take measures automatically, such as days without cars and trucks that are not 100% electric; monitors the live impact of the measures taken; forecasts COVID-19 risk due to PM2.5 exposure up to 4 days ahead; and predicts new hospitalizations due to severe COVID-19 cases for all states/departments. We have given this tool the analogous name COVSCO (Copernicus Satellites Versus COVID-19). We start with France and its 96 departments. A follow-up will apply the same methodology to severe respiratory diseases and expand the model and databases to a global scale.
nom numero time hospi reanim newhospi newreanim \
34 Ain 1.0 2020-05-14 137.0 8.0 4.0 0.0
35 Ain 1.0 2020-05-15 135.0 7.0 4.0 0.0
36 Ain 1.0 2020-05-16 134.0 6.0 1.0 0.0
37 Ain 1.0 2020-05-17 133.0 6.0 1.0 0.0
38 Ain 1.0 2020-05-18 132.0 6.0 1.0 0.0
... ... ... ... ... ... ... ...
34171 Val-d'Oise 95.0 2021-03-27 659.0 79.0 43.0 7.0
34172 Val-d'Oise 95.0 2021-03-28 667.0 81.0 38.0 7.0
34173 Val-d'Oise 95.0 2021-03-29 680.0 76.0 44.0 4.0
34174 Val-d'Oise 95.0 2021-03-30 688.0 75.0 88.0 5.0
34175 Val-d'Oise 95.0 2021-03-31 698.0 77.0 83.0 7.0
deces gueris dep_num ... Region_y Departement_y \
34 88.0 318.0 1.0 ... Auvergne-Rhône-Alpes Ain
35 89.0 323.0 1.0 ... Auvergne-Rhône-Alpes Ain
36 90.0 325.0 1.0 ... Auvergne-Rhône-Alpes Ain
37 90.0 326.0 1.0 ... Auvergne-Rhône-Alpes Ain
38 90.0 331.0 1.0 ... Auvergne-Rhône-Alpes Ain
... ... ... ... ... ... ...
34171 1603.0 6857.0 95.0 ... Île-de-France Val-d'Oise
34172 1606.0 6882.0 95.0 ... Île-de-France Val-d'Oise
34173 1615.0 6902.0 95.0 ... Île-de-France Val-d'Oise
34174 1632.0 6964.0 95.0 ... Île-de-France Val-d'Oise
34175 1639.0 7024.0 95.0 ... Île-de-France Val-d'Oise
depnum_y Smokers Nb_susp_501Y_V1 Nb_susp_501Y_V2_3 minority \
34 1 0.262 0 0 54821.0
35 1 0.262 0 0 54821.0
36 1 0.262 0 0 54821.0
37 1 0.262 0 0 54821.0
38 1 0.262 0 0 54821.0
... ... ... ... ... ...
34171 95 0.213 9561 432 161947.0
34172 95 0.213 9557 428 161947.0
34173 95 0.213 9496 396 161947.0
34174 95 0.213 8643 364 161947.0
34175 95 0.213 7915 332 161947.0
pauvrete rsa ouvriers
34 10.7 2.3 17.74
35 10.7 2.3 17.74
36 10.7 2.3 17.74
37 10.7 2.3 17.74
38 10.7 2.3 17.74
... ... ... ...
34171 16.8 5.8 17.63
34172 16.8 5.8 17.63
34173 16.8 5.8 17.63
34174 16.8 5.8 17.63
34175 16.8 5.8 17.63
[30912 rows x 92 columns]
| | idx | pm25 | no2 | o3 | pm10 | co | pm257davg | no27davg | o37davg | co7davg | ... | normno27davg | normo37davg | normpm107davg | normco7davg | normpm251Mavg | normno21Mavg | normo31Mavg | normpm101Mavg | normco1Mavg | newhospi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 34 | 631877.0 | 6.403166 | 3.939003 | 43.262828 | 7.967417 | 177.622754 | 7.257504 | 3.476745 | 72.813923 | 161.817954 | ... | 0.046602 | 0.586532 | 0.129071 | 0.234867 | 0.108579 | 0.048632 | 0.559299 | 0.094840 | 0.218359 | 4.0 |
| 35 | 631877.0 | 10.041256 | 4.039703 | 45.365958 | 12.985050 | 177.204810 | 7.283296 | 3.523846 | 71.713745 | 162.361518 | ... | 0.047305 | 0.577448 | 0.127923 | 0.236187 | 0.119927 | 0.051150 | 0.496692 | 0.102941 | 0.226082 | 4.0 |
| 36 | 631877.0 | 8.650893 | 2.993409 | 64.447998 | 10.213892 | 173.986833 | 7.222348 | 3.502823 | 70.928693 | 162.525890 | ... | 0.046991 | 0.570966 | 0.125429 | 0.236587 | 0.129652 | 0.051861 | 0.455272 | 0.105411 | 0.234060 | 1.0 |
| 37 | 631877.0 | 7.924968 | 2.470320 | 81.736362 | 11.228378 | 163.052671 | 7.159819 | 3.487527 | 70.520307 | 162.329540 | ... | 0.046763 | 0.567594 | 0.123235 | 0.236110 | 0.127766 | 0.051226 | 0.458184 | 0.101608 | 0.239713 | 1.0 |
| 38 | 631877.0 | 8.803713 | 2.883282 | 79.918855 | 11.338718 | 186.330957 | 7.143561 | 3.477693 | 69.999498 | 162.848931 | ... | 0.046616 | 0.563293 | 0.120713 | 0.237371 | 0.142927 | 0.049201 | 0.477197 | 0.113024 | 0.255846 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 34171 | 1215390.0 | 6.100140 | 6.627001 | 70.229467 | 11.613934 | 185.482428 | 14.580851 | 14.493739 | 56.915039 | 207.123258 | ... | 0.211142 | 0.455257 | 0.281847 | 0.344925 | 0.302061 | 0.237179 | 0.471712 | 0.271828 | 0.352307 | 43.0 |
| 34172 | 1215390.0 | 8.425896 | 9.099569 | 68.209629 | 13.743489 | 193.139682 | 14.195306 | 14.223872 | 57.801110 | 206.960739 | ... | 0.207112 | 0.462573 | 0.269762 | 0.344530 | 0.289930 | 0.240049 | 0.480526 | 0.265162 | 0.358714 | 38.0 |
| 34173 | 1215390.0 | 13.033086 | 29.526526 | 54.441418 | 18.517216 | 237.388093 | 14.249303 | 14.775230 | 57.704602 | 207.544318 | ... | 0.215346 | 0.461776 | 0.270202 | 0.345948 | 0.295797 | 0.280570 | 0.463981 | 0.266083 | 0.371198 | 44.0 |
| 34174 | 1215390.0 | 21.792762 | 51.437356 | 40.131381 | 30.011262 | 307.382279 | 14.525749 | 16.057324 | 57.291931 | 210.476992 | ... | 0.234495 | 0.458369 | 0.274637 | 0.353072 | 0.286940 | 0.319473 | 0.461757 | 0.261254 | 0.379313 | 88.0 |
| 34175 | 1215390.0 | 21.862512 | 49.729555 | 42.796275 | 32.744934 | 290.497534 | 14.654314 | 17.360417 | 57.019812 | 213.319222 | ... | 0.253957 | 0.456122 | 0.278918 | 0.359977 | 0.278900 | 0.366677 | 0.452311 | 0.260245 | 0.393208 | 83.0 |
30912 rows × 56 columns
Index(['idx', 'pm25', 'pm257davg', 'normpm25', 'hospiprevday',
'covidpostestprevday', 'prevdaytotalcovidcasescumulated',
'all_day_bing_tiles_visited_relative_change',
'all_day_ratio_single_tile_users', 'vac1nb', 'vac2nb',
'Insuffisance respiratoire chronique grave (ALD14)',
'Insuffisance cardiaque grave, troubles du rythme graves, cardiopathies valvulaires graves, cardiopathies congénitales graves (ALD5)',
'Smokers', 'minority', 'Nb_susp_501Y_V1', 'Nb_susp_501Y_V2_3',
'1MMaxpm25', 'pm251Mavg', 'pauvrete', 'rsa', 'ouvriers'],
dtype='object')
22
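Several of the feature names listed above suggest lagged values (`hospiprevday`) and rolling averages (`pm257davg`). The following is a minimal sketch of how such columns could be derived with pandas; the toy values and the exact window definition are assumptions, not the project's actual preprocessing.

```python
import pandas as pd

# Toy data: two departments, five days each (values are made up).
df = pd.DataFrame({
    "numero": [1] * 5 + [95] * 5,
    "time": list(pd.date_range("2020-05-14", periods=5)) * 2,
    "pm25": [6.0, 10.0, 8.0, 7.0, 9.0, 6.0, 8.0, 13.0, 21.0, 22.0],
    "hospi": [137, 135, 134, 133, 132, 659, 667, 680, 688, 698],
})
df = df.sort_values(["numero", "time"]).reset_index(drop=True)

by_dep = df.groupby("numero")

# Previous-day value, shifted within each department so that nothing
# leaks from one department into the next.
df["hospiprevday"] = by_dep["hospi"].shift(1)

# Rolling mean over at most 7 days; min_periods=1 (an assumption) lets the
# first days of each series use a shorter window instead of yielding NaN.
df["pm257davg"] = by_dep["pm25"].transform(
    lambda s: s.rolling(7, min_periods=1).mean()
)
```

Grouping by department before shifting or rolling is the important design choice: without it, the first row of one department would inherit values from the last row of the previous one.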
We will predict the daily number of new hospitalizations due to severe COVID-19 cases for every French department.
[Plots omitted; y-axis label: newhospimean]
[95.]
Département Numéro Date of pollution peak 1MMaxo3 \
0 Val-d'Oise 95.0 2020-08-09 122.889769
1 Alpes-de-Haute-Provence 4.0 2020-08-08 119.833327
2 Haut-Rhin 68.0 2020-07-31 116.979563
3 Moselle 57.0 2020-08-10 115.873423
4 Ain 1.0 2020-09-18 114.506434
.. ... ... ... ...
90 Pyrénées-Atlantiques 64.0 2020-08-06 94.093549
91 Gard 30.0 2020-05-20 93.974202
92 Cher 18.0 2020-07-13 93.921768
93 Hautes-Pyrénées 65.0 2020-05-30 93.086194
94 Ariège 9.0 2020-05-29 90.297432
totalcovidcasescumulated Population Index
0 237793 1215390.0
1 20858 161799.0
2 79025 762607.0
3 141705 1044486.0
4 109223 631877.0
.. ... ...
90 62382 670032.0
91 107351 738189.0
92 32904 308992.0
93 25307 228582.0
94 14118 152499.0
[95 rows x 6 columns]
[59.]
Département Numéro Date of pollution peak 1MMaxpm25 \
0 Nord 59.0 2020-11-27 39.932960
1 Haut-Rhin 68.0 2021-02-24 37.243984
2 Deux-Sèvres 79.0 2021-03-09 36.380767
3 Paris 75.0 2021-01-02 35.418335
4 Vienne 86.0 2021-03-09 34.895373
.. ... ... ... ...
90 Alpes-Maritimes 6.0 2021-03-06 20.296409
91 Côte-d'Or 21.0 2021-02-23 20.115372
92 Lozère 48.0 2021-03-04 19.935369
93 Alpes-de-Haute-Provence 4.0 2021-02-23 19.765781
94 Ardèche 7.0 2021-03-04 19.407017
totalcovidcasescumulated Population Index
0 477936 2605238.0
1 79025 762607.0
2 35767 374435.0
3 404122 2206488.0
4 38030 434887.0
.. ... ...
90 219293 1082440.0
91 66868 533147.0
92 10367 76309.0
93 20858 161799.0
94 45658 324209.0
[95 rows x 6 columns]
[75.]
Département Numéro Date of pollution peak 1MMaxno2 \
0 Paris 75.0 2021-03-31 67.312539
1 Hauts-de-Seine 92.0 2021-03-02 64.475306
2 Val-de-Marne 94.0 2021-01-08 53.384538
3 Val-d'Oise 95.0 2021-03-02 51.599818
4 Yvelines 78.0 2020-11-26 43.024886
.. ... ... ... ...
90 Lozère 48.0 2021-01-07 7.727260
91 Corse-du-Sud 201.0 2020-12-09 6.285247
92 Ariège 9.0 2021-01-09 6.254226
93 Pyrénées-Orientales 66.0 2021-01-09 6.071631
94 Haute-Corse 202.0 2021-01-09 5.592579
totalcovidcasescumulated Population Index
0 404122 2206488.0
1 266670 1601569.0
2 265897 1372389.0
3 237793 1215390.0
4 212766 1427291.0
.. ... ...
90 10367 76309.0
91 11183 152730.0
92 14118 152499.0
93 45028 471038.0
94 13884 174553.0
[95 rows x 6 columns]
[92.]
Département Numéro Date of pollution peak 1MMaxco \
0 Hauts-de-Seine 92.0 2020-11-26 476.783872
1 Bas-Rhin 67.0 2020-11-10 442.772829
2 Paris 75.0 2020-11-26 442.031472
3 Val-de-Marne 94.0 2021-01-02 400.354289
4 Bouches-du-Rhône 13.0 2021-02-24 364.207868
.. ... ... ... ...
90 Cantal 15.0 2021-01-06 204.699850
91 Lozère 48.0 2021-01-07 202.482931
92 Hautes-Pyrénées 65.0 2021-01-10 200.001928
93 Ariège 9.0 2021-01-11 189.451920
94 Pyrénées-Orientales 66.0 2021-03-06 182.390438
totalcovidcasescumulated Population Index
0 266670 1601569.0
1 138675 1116658.0
2 404122 2206488.0
3 265897 1372389.0
4 404460 2016622.0
.. ... ...
90 11762 146219.0
91 10367 76309.0
92 25307 228582.0
93 14118 152499.0
94 45028 471038.0
[95 rows x 6 columns]
[67.]
Département Numéro Date of pollution peak 1MMaxpm10 \
0 Bas-Rhin 67.0 2021-02-25 74.188288
1 Haut-Rhin 68.0 2021-02-25 71.831104
2 Corse-du-Sud 201.0 2021-02-06 70.996064
3 Vosges 88.0 2021-02-25 70.504318
4 Haute-Saône 70.0 2021-02-25 69.817902
.. ... ... ... ...
90 Mayenne 53.0 2021-03-03 40.262764
91 Eure 27.0 2021-03-03 39.774691
92 Calvados 14.0 2021-03-02 38.762629
93 Sarthe 72.0 2021-03-03 36.942586
94 Orne 61.0 2021-03-02 36.142811
totalcovidcasescumulated Population Index
0 138675 1116658.0
1 79025 762607.0
2 11183 152730.0
3 41055 372016.0
4 28170 237706.0
.. ... ...
90 29192 307940.0
91 65397 601948.0
92 61873 693579.0
93 57288 568445.0
94 29094 286618.0
[95 rows x 6 columns]
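The ranking tables above list, for each department, a peak pollutant value and the date on which it occurred. A minimal sketch of one way to build such a ranking with pandas follows; the toy values and the `peak_pm25` column name are hypothetical, and the 1-month window behind the `1MMax...` columns is not reproduced here.

```python
import pandas as pd

# Toy daily PM2.5 series for three departments (values are made up).
df = pd.DataFrame({
    "nom": ["Nord"] * 3 + ["Paris"] * 3 + ["Lozère"] * 3,
    "time": list(pd.date_range("2021-03-01", periods=3)) * 3,
    "pm25": [12.0, 39.9, 20.0, 35.4, 10.0, 15.0, 5.0, 19.9, 8.0],
})

# For each department, keep the row on which the pollutant peaked.
peak_rows = df.loc[df.groupby("nom")["pm25"].idxmax()]

# Sort departments from most to least polluted at their peak.
ranking = (
    peak_rows
    .rename(columns={
        "nom": "Département",
        "time": "Date of pollution peak",
        "pm25": "peak_pm25",  # hypothetical column name
    })
    .sort_values("peak_pm25", ascending=False)
    .reset_index(drop=True)
)
```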
Gradient Boosting for regression.
GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage a regression tree is fit on the negative gradient of the given loss function.
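To make the stage-wise behaviour concrete, here is a minimal sketch using scikit-learn's `GradientBoostingRegressor` on synthetic data. The data and hyperparameters are assumptions chosen for illustration, not the project's settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic regression problem (made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbr = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
)
gbr.fit(X_train, y_train)
mae = mean_absolute_error(y_test, gbr.predict(X_test))

# staged_predict yields the prediction after each boosting stage, which
# shows how the additive model improves as trees are added.
staged = [mean_absolute_error(y_test, p) for p in gbr.staged_predict(X_test)]
```

Plotting `staged` against the stage number is a quick way to see whether more trees still help or whether the error has flattened.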
Stack of estimators with a final regressor.
Stacked generalization consists in stacking the outputs of the individual estimators and using a regressor to compute the final prediction. Stacking makes it possible to use the strength of each individual estimator by using their outputs as the input of a final estimator.
Note that estimators_ are fitted on the full X while final_estimator_ is trained using cross-validated predictions of the base estimators using cross_val_predict.
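A minimal sketch of stacking with scikit-learn's `StackingRegressor` follows. The choice of base estimators and of `RidgeCV` as the final estimator is an assumption for illustration; the source does not state which estimators were stacked.

```python
import numpy as np
from sklearn.ensemble import (
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    StackingRegressor,
)
from sklearn.linear_model import RidgeCV

# Synthetic regression problem (made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=300)

stack = StackingRegressor(
    estimators=[
        ("gbr", GradientBoostingRegressor(random_state=0)),
        ("et", ExtraTreesRegressor(n_estimators=50, random_state=0)),
    ],
    final_estimator=RidgeCV(),  # combines the base predictions
    cv=5,  # folds used to produce the cross-validated base predictions
)
stack.fit(X, y)
sample_predictions = stack.predict(X[:3])
```

After fitting, `stack.final_estimator_.coef_` holds one weight per base estimator, which gives a rough view of how much each one contributes.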
T-Pot exported current best pipeline MSE: 51.92040608159389 MAE: 3.572805773399538 (30912, 1) (30912, 92)
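For reference, the two metrics reported above are the mean squared error and the mean absolute error. The sketch below computes both by hand on a made-up example and checks them against scikit-learn's implementations; the numbers are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up observed and predicted daily new hospitalizations.
y_true = np.array([4.0, 4.0, 1.0, 1.0])
y_pred = np.array([3.0, 5.0, 1.0, 3.0])

errors = y_true - y_pred
mse_manual = np.mean(errors ** 2)      # penalizes large errors more heavily
mae_manual = np.mean(np.abs(errors))   # average error in hospitalizations/day

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
```

Because MAE is expressed in the same unit as the target, it is the easier of the two to interpret for daily hospitalization counts.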
Scikit Learn - GradientBoostingRegressor:
index feature_importance
9 vac1nb 0.001813
16 Nb_susp_501Y_V2_3 0.001920
19 pauvrete 0.002501
13 Smokers 0.002902
10 vac2nb 0.002929
1 pm25 0.002964
20 rsa 0.003047
3 normpm25 0.003323
21 ouvriers 0.003725
11 Insuffisance respiratoire chronique grave (ALD14) 0.003786
15 Nb_susp_501Y_V1 0.004309
12 Insuffisance cardiaque grave, troubles du ryth... 0.004505
14 minority 0.004867
0 idx 0.005109
17 1MMaxpm25 0.006621
18 pm251Mavg 0.007130
2 pm257davg 0.009068
7 all_day_bing_tiles_visited_relative_change 0.011923
8 all_day_ratio_single_tile_users 0.039006
6 prevdaytotalcovidcasescumulated 0.122313
5 covidpostestprevday 0.274968
4 hospiprevday 0.481272
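A table like the one above can be obtained from a fitted model's `feature_importances_` attribute. This is a minimal sketch on synthetic data; three feature names are borrowed from the table, but the values and the relationship to the target are made up.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic features: the target depends strongly on hospiprevday, weakly
# on covidpostestprevday, and not at all on pm25 (an illustrative setup).
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.normal(size=(400, 3)),
    columns=["hospiprevday", "covidpostestprevday", "pm25"],
)
y = (
    5.0 * X["hospiprevday"]
    + 1.0 * X["covidpostestprevday"]
    + rng.normal(scale=0.1, size=400)
)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Impurity-based importances sum to 1; sort ascending like the table above.
importance = pd.DataFrame({
    "index": X.columns,
    "feature_importance": model.feature_importances_,
}).sort_values("feature_importance")
```

Impurity-based importances tend to favour features that are strongly correlated with the target's own history, so a low score for a pollutant does not by itself rule out an effect.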
TPOTRegressor
TPOT closed during evaluation in one generation.
WARNING: TPOT may not provide a good pipeline if TPOT is stopped/interrupted in a early generation.
TPOT closed prematurely. Will use the current best pipeline.
Best pipeline: ExtraTreesRegressor(CombineDFs(input_matrix, input_matrix), bootstrap=False, max_features=0.5, min_samples_leaf=1, min_samples_split=20, n_estimators=100)
-48.56940976377202
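The reported best pipeline can be approximated directly in scikit-learn. The sketch below assumes that `CombineDFs(input_matrix, input_matrix)` amounts to concatenating the feature matrix with itself column-wise, and that the trailing score is presumably a negative mean squared error from TPOT's internal cross-validation; the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression problem (made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = 4.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.2, size=300)

# Assumed equivalent of CombineDFs(input_matrix, input_matrix):
# every feature appears twice.
X_combined = np.hstack([X, X])

# Hyperparameters copied from the reported best pipeline.
model = ExtraTreesRegressor(
    bootstrap=False,
    max_features=0.5,
    min_samples_leaf=1,
    min_samples_split=20,
    n_estimators=100,
    random_state=0,
)

# Negative MSE per fold: values closer to zero are better.
scores = cross_val_score(
    model, X_combined, y, scoring="neg_mean_squared_error", cv=5
)
```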
The elbow method determines that the optimal number of clusters for PM2.5 Levels is k = 4
0.6375679675824064
39.9329597660099
[0.6375679675824064, 10.46141591718928, 20.285263866796154, 30.10911181640303, 39.9329597660099]
nom pm25levelstring
4983 Nord High
5695 Pas-de-Calais High
6051 Somme High
31683 Paris High
33107 Hauts-de-Seine High
34175 Val-d'Oise High
OK
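The five printed edges are equally spaced between the minimum and maximum PM2.5 values, which is consistent with splitting the range into four intervals. A minimal sketch of that binning follows; only the "High" label appears in the output above, so the other three label names are hypothetical.

```python
import numpy as np
import pandas as pd

pm25_min = 0.6375679675824064  # minimum printed above
pm25_max = 39.9329597660099    # maximum printed above
k = 4                          # number of levels from the elbow method

# k intervals need k + 1 equally spaced edges.
edges = np.linspace(pm25_min, pm25_max, k + 1)

# Only "High" is attested in the source; the other names are assumptions.
labels = ["Low", "Moderate", "Elevated", "High"]

values = pd.Series([pm25_min, 12.0, 25.0, pm25_max])
levels = pd.cut(values, bins=edges, labels=labels, include_lowest=True)
```

`include_lowest=True` matters here: without it, the department holding the minimum value would fall outside the first interval and receive no level.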
Part of our product is dedicated to predicting the daily number of new hospitalizations due to severe cases of COVID-19 in each department. These predictions are made by a state-of-the-art machine learning model, fine-tuned by an AutoML optimizer, and hospitals and clinics need them to organize emergency transfers of patients to other locations in case of overcrowding.
The virus's contagiousness and Facebook's mobility index lead our model's feature importance report, but data visualization makes it clear that the mean of new hospitalizations due to severe COVID-19 cases across all French departments is an increasing function of the PM2.5 1-month maximum and of the PM10 7-day average differentials. Furthermore, unusually high ground-level ozone seems to act as a trigger for the epidemic.
Our model makes predictions from live data flows comprising a set of 22 features: some are ground-level atmospheric pollutant concentrations, others reflect the prevalence/incidence of variants, and others the impact of the vaccination campaign. Temperature and humidity data will follow soon.
Our algorithm is first trained on a historical database of maximum depth, obtained by merging features streamed from a multitude of data sources/providers into our database. Training will be relaunched frequently so that the algorithm keeps learning from new data. An API will load the latest model and feed all the necessary outputs, including GIS elements, directly to the company's product website.
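The retrain-then-serve cycle described above could be sketched as follows, using `joblib` to persist the model between the training job and the API. The file name, function names, and the choice of estimator are all hypothetical; the source does not describe how the model is stored or served.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

MODEL_FILE = "latest_model.joblib"  # hypothetical file name


def retrain_and_save(X, y, model_dir):
    """Fit a fresh model on all available data and overwrite the saved one."""
    model = GradientBoostingRegressor(random_state=0).fit(X, y)
    path = Path(model_dir) / MODEL_FILE
    joblib.dump(model, path)
    return path


def load_latest_and_predict(model_dir, X_new):
    """What an API endpoint might do: load the latest model and predict."""
    model = joblib.load(Path(model_dir) / MODEL_FILE)
    return model.predict(X_new)


# Synthetic data standing in for the merged historical database.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

with tempfile.TemporaryDirectory() as model_dir:
    saved_path = retrain_and_save(X, y, model_dir)
    saved_exists = saved_path.exists()
    predictions = load_latest_and_predict(model_dir, X[:5])
```

Keeping training and serving separated by a file on disk means the API never has to wait for a training run; it simply picks up whichever model was written last.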
Our tool also ranks French departments by their pollution levels and raises alerts when levels of PM2.5, PM10 and/or other pollutants are abnormally high; the optimal number of levels is determined with the K-Means clustering elbow method. These alerts are translated into recommendations, with the goal that the government automatically takes measures to stop heavy traffic pollution from non-electric cars and trucks until the monitoring team determines that live pollutant levels have fallen to a safe cluster interval.
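The elbow method mentioned above can be sketched as follows: fit K-Means for a range of k, record the inertia (within-cluster sum of squares), and look for the k after which the inertia stops dropping sharply. The one-dimensional data below is synthetic and deliberately built with four groups, so it only illustrates the mechanics, not the project's result.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 1-D "pollutant levels" with four well-separated groups.
rng = np.random.default_rng(0)
values = np.concatenate(
    [rng.normal(loc=c, scale=1.0, size=100) for c in (5.0, 15.0, 25.0, 35.0)]
).reshape(-1, 1)

ks = list(range(1, 7))
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(values).inertia_
    for k in ks
]

# Relative drop in inertia when going from k to k + 1 clusters; the elbow
# is where this drop becomes small.
relative_drops = [
    (inertias[i] - inertias[i + 1]) / inertias[i]
    for i in range(len(inertias) - 1)
]
```

In practice the elbow is usually read off a plot of `inertias` against `ks`; on this synthetic data the drop from 3 to 4 clusters is large and the drop from 4 to 5 is small, which points to k = 4.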
Data pre-processing:
Population ... OK
Covid ... numero hospi reanim newhospi newreanim deces \
0 1.0 191.000000 21.428571 12.571429 2.857143 574.142857
1 2.0 312.285714 54.857143 22.285714 3.571429 944.142857
2 3.0 90.000000 17.428571 8.571429 1.000000 523.142857
3 4.0 150.142857 6.285714 6.571429 0.428571 229.000000
4 5.0 120.571429 12.428571 5.000000 0.714286 237.285714
.. ... ... ... ... ... ...
96 971.0 100.428571 22.142857 10.714286 2.428571 213.142857
97 972.0 113.857143 25.428571 11.142857 3.714286 65.714286
98 973.0 41.428571 11.857143 5.571429 1.285714 91.000000
99 974.0 149.571429 39.000000 10.142857 3.000000 146.142857
100 976.0 13.285714 6.000000 0.000000 0.142857 124.000000
gueris
0 2454.142857
1 3483.285714
2 1824.714286
3 956.142857
4 1042.428571
.. ...
96 982.857143
97 540.571429
98 2295.000000
99 1307.571429
100 1225.428571
[101 rows x 7 columns]
reg dep com article com_nom lon lat \
0 11 75 56 NaN PARIS 2.352222 48.856614
1 11 77 1 NaN ACHERES-LA-FORET 2.570289 48.354976
2 11 77 10 NaN AUBEPIERRE-OZOUER-LE-REPOS 2.890552 48.632323
3 11 77 100 LE CHATELET-EN-BRIE 2.792095 48.504945
4 11 77 101 NaN CHATENAY-SUR-SEINE 3.096229 48.418774
... ... ... ... ... ... ... ...
36313 94 202 9 NaN ALERIA 9.512429 42.104248
36314 94 202 93 NaN CORBARA 8.907482 42.615508
36315 94 202 95 NaN CORSCIA 9.042592 42.354646
36316 94 202 96 NaN CORTE 9.149022 42.309409
36317 94 202 97 NaN COSTA 9.001945 42.574916
total idx hospi
0 2.240213e+06 1 1717.142857142857
1 1.285127e+03 1 714.2857142857143
2 8.993365e+02 1 714.2857142857143
3 4.454928e+03 1 714.2857142857143
4 8.795762e+02 1 714.2857142857143
... ... ... ...
36313 2.005203e+03 0.134286 26.857142857142858
36314 1.002777e+03 0.134286 26.857142857142858
36315 1.833197e+02 0.134286 26.857142857142858
36316 6.756341e+03 0.134286 26.857142857142858
36317 6.962550e+01 0.134286 26.857142857142858
[36318 rows x 10 columns]
OK
PM2.5 ... OK
<xarray.DataArray 'pm2p5_conc' (com: 36318)>
array([15.13080737, 11.59004665, 13.34271505, ..., 4.27186682,
4.55498089, 4.32766635])
Coordinates:
time object ...
longitude (com) float64 2.352 2.57 2.891 2.792 ... 8.907 9.043 9.149 9.002
latitude (com) float64 48.86 48.35 48.63 48.5 ... 42.62 42.35 42.31 42.57
* com (com) int64 0 1 2 3 4 5 6 ... 36312 36313 36314 36315 36316 36317
2.473360196194873